Abstract

Sphere decoding achieves maximum-likelihood (ML) performance at the cost of exponential complexity;
lattice reduction-aided successive interference cancelation (SIC) significantly reduces the decoding
complexity, but exhibits a widening gap to ML performance as the dimension increases. To bridge the 
gap between them, this paper presents randomized lattice decoding based on Klein's sampling technique, 
which is a randomized version of Babai's nearest plane algorithm (i.e., SIC). To find the closest lattice 
point, Klein's algorithm is used to sample some lattice points and the closest among those samples is 
chosen. Lattice reduction increases the probability of finding the closest lattice point, and only needs 
to be run once during pre-processing. Further, the sampling can operate very efficiently in parallel. The 
technical contribution of this paper is two-fold: we analyze and optimize the performance of randomized 
lattice decoding resulting in reduced decoding complexity, and propose a very efficient implementation of 
random rounding. Simulation results demonstrate near-ML performance achieved by a moderate number 
of samples, when the dimension is not too large. Compared to existing decoders, a salient feature 
of randomized lattice decoding is that it will sample a closer lattice point with higher probability. A 
byproduct is that boundary errors for finite constellations can be partially compensated if we discard the 
samples falling outside of the constellation. 

I. INTRODUCTION

Decoding for the linear multi-input multi-output (MIMO) channel is a problem of high relevance in 
multi-antenna, cooperative and other multi-terminal communication systems. The computational com- 
plexity associated with maximum-likelihood (ML) decoding poses significant challenges for hardware 
implementation. When the codebook forms a lattice, ML decoding corresponds to solving the closest
lattice vector problem (CVP). The worst-case complexity for solving the CVP optimally for generic
lattices is non-deterministic polynomial-time (NP)-hard. The best CVP algorithms to date are Kannan's
[1], which has been shown to be of complexity $n^{n/2+o(n)}$, where $n$ is the lattice dimension (see [2]), and whose
space requirement is polynomial in $n$, and the recent algorithm by Micciancio and Voulgaris [3], which
has complexity $2^{O(n)}$ with respect to both time and space. In digital communications, a finite subset of
the lattice is used due to the power constraint. ML decoding for a finite lattice can be realized efficiently
by sphere decoding [4], [5], [6], whose average complexity grows exponentially with $n$ for any fixed
SNR [7]. This limits sphere decoding to low dimensions. The decoding complexity is especially felt in
coded systems. For instance, to decode the 4 x 4 perfect code [8], one has to search in a 32-dimensional
(real-valued) lattice. The state-of-the-art sphere decoding is slow for this dimension. Although some
fast-decodable codes have been proposed recently [9], the decoding still relies on sphere decoding.

Thus, we often have to resort to an approximate solution. The problem of solving CVP approximately 
was first addressed by Babai in [10], which in essence applies zero-forcing (ZF) or successive interference
cancelation (SIC) on a reduced lattice. This technique is referred to as lattice-reduction-aided decoding
in [11], [12]. It is known that Lenstra, Lenstra and Lovász (LLL) reduction achieves full diversity in MIMO
fading channels [13], [14] and that lattice-reduction-aided decoding has a constant gap to (infinite) lattice
decoding [15]. It was further shown in [16] that minimum mean square error (MMSE)-based
lattice-reduction-aided decoding achieves the optimal diversity and multiplexing tradeoff. In [17], it was shown
that Babai's decoding using MMSE can provide near-ML performance for small-size MIMO systems.
However, the analysis in [15] revealed a widening gap to lattice decoding. Thus, for high-dimensional
systems and high-level modulation such as 64-QAM, the performance loss relative to ML is still large.

In this work, we present randomized lattice decoding to narrow down the gap between lattice-reduction- 
aided SIC and sphere decoding. We use Klein's randomized CVP algorithm [18], which is a randomized
version of Babai's nearest plane algorithm (i.e., SIC). The core of Klein's algorithm is randomized
rounding, which generalizes standard rounding by not necessarily rounding to the nearest integer.
Thus far, Klein's algorithm has mostly remained a theoretical tool in the lattice literature, and we are
unaware of any experimental work on Klein's algorithm in the MIMO literature. In this paper, we sample
some lattice points by using Klein's algorithm and choose the closest from the list of sampled lattice 
points. By varying the list size K, it enjoys a flexible tradeoff between complexity and performance. 
It is worth noting that Klein applied his algorithm to find the closest lattice point only when it is 
very close to the input vector. We do not have this restriction in this paper, although in essence it is 
also a probabilistic bounded-distance decoder. The technical contribution of this paper is two-fold: we
analyze and optimize the performance of randomized lattice decoding which leads to reduced decoding 
complexity, and propose a very efficient implementation of Klein's random rounding. Simulation results 
demonstrate near-ML performance achieved by a moderate number of samples when the dimension is 
not too large. The performance-complexity tradeoff of randomized lattice decoding is comparable to that
of the new decoding algorithms proposed in [19], [20] very recently.

Randomized lattice decoding distinguishes itself from previous list-based detectors [21], [22], [23], [24]
in several ways. Firstly, the way it builds its list is distinct. More precisely, it randomly samples lattice
points with a discrete distribution centered at the received signal and returns the closest among them.
Hence, randomized lattice decoding is more likely to find the closest lattice point than [24], where a list of
candidate lattice points is built in the vicinity of the SIC output. Secondly, the expensive lattice reduction
is only performed once during pre-processing, which means that the extra complexity is $O(Kn^2)$ in
addition to that of lattice reduction. In [22], a bank of 2n parallel lattice reduction-aided detectors was
used. The coset-based lattice detection scheme in [23] also needs lattice reduction many times. Thirdly,
randomized lattice decoding enjoys a proven gain given the list size K; all previous schemes might
be viewed as various heuristics apparently without such proven gains. Note that list-based detectors
(including our algorithm) may prove useful in the context of incremental lattice decoding [25], as they
provide a fall-back strategy when SIC starts failing due to the variation of the lattice.

It is worth mentioning that Klein's sampling technique is emerging as a fundamental building block in
a number of new lattice algorithms [26], [27]. Thus, our analysis and implementation may benefit those
algorithms as well. 

The paper is organized as follows: Section II presents the transmission model and lattice decoding, 
followed by a description of Klein's randomized decoding algorithm in Section III. In Section IV the 
fine-tuning and analysis of Klein's decoding is given, and the efficient implementation and extensions to 
complex-valued systems and MMSE are proposed in Section V. Section VI evaluates the performance 
and complexity by computer simulation. Some concluding remarks are offered in Section VII. 

Notation: Matrices and column vectors are denoted by upper and lowercase boldface letters (unless
otherwise stated), and the transpose, inverse, and pseudoinverse of a matrix $\mathbf{B}$ by $\mathbf{B}^T$, $\mathbf{B}^{-1}$, and $\mathbf{B}^{\dagger}$, respectively.
The inner product in the Euclidean space between vectors $\mathbf{u}$ and $\mathbf{v}$ is defined as $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T\mathbf{v}$,
and the Euclidean length $\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}$. $\lceil x \rfloor$ rounds to a closest integer, while $\lfloor x \rfloor$ rounds to the closest integer
smaller than or equal to $x$. The $\Re$ and $\Im$ prefixes denote the real and imaginary parts. We use the standard
big and small O notation $O(\cdot)$ and $o(\cdot)$.

II. LATTICE CODING AND DECODING
Consider an $n_T \times n_R$ flat-fading MIMO system model consisting of $n_T$ transmitters and $n_R$ receivers

$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{N}$,    (1)

where $\mathbf{X} \in \mathbb{C}^{n_T \times T}$ and $\mathbf{Y}, \mathbf{N} \in \mathbb{C}^{n_R \times T}$, of block length $T$, denote the channel input, output and noise,
respectively, and $\mathbf{H} \in \mathbb{C}^{n_R \times n_T}$ is the $n_R \times n_T$ full-rank channel gain matrix with $n_R \ge n_T$, all of whose
elements are i.i.d. complex Gaussian random variables $\mathcal{CN}(0, 1)$. The entries of $\mathbf{N}$ are i.i.d. complex
Gaussian with variance $\sigma^2$ each. The codewords $\mathbf{X}$ satisfy the average power constraint $E[\|\mathbf{X}\|_F^2/T] = 1$.
Hence, the signal-to-noise ratio (SNR) at each receive antenna is $1/\sigma^2$.

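When a lattice space-time block code is employed, the codeword $\mathbf{X}$ is obtained by forming an $n_T \times T$
matrix from the vector $\mathbf{s} \in \mathbb{C}^{n_T T}$, where $\mathbf{s}$ is obtained by multiplying the $n_T T \times 1$ QAM vector $\mathbf{x}$ by the generator
matrix $\mathbf{G}$ of the encoding lattice, i.e., $\mathbf{s} = \mathbf{G}\mathbf{x}$. By column-by-column vectorization of the matrices $\mathbf{Y}$
and $\mathbf{N}$ in (1), i.e., $\mathbf{y} = \mathrm{Vec}(\mathbf{Y})$ and $\mathbf{n} = \mathrm{Vec}(\mathbf{N})$, the received signal at the destination can be expressed
as

$\mathbf{y} = (\mathbf{I}_T \otimes \mathbf{H})\mathbf{G}\mathbf{x} + \mathbf{n}$.    (2)

When $T = 1$ and $\mathbf{G} = \mathbf{I}_{n_T}$, (2) reduces to the model for uncoded MIMO communication $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$.
Further, we can equivalently write

$\begin{bmatrix} \Re\mathbf{y} \\ \Im\mathbf{y} \end{bmatrix} = \begin{bmatrix} \Re\mathbf{H} & -\Im\mathbf{H} \\ \Im\mathbf{H} & \Re\mathbf{H} \end{bmatrix} \begin{bmatrix} \Re\mathbf{x} \\ \Im\mathbf{x} \end{bmatrix} + \begin{bmatrix} \Re\mathbf{n} \\ \Im\mathbf{n} \end{bmatrix}$,    (3)

which gives an equivalent $2n_T \times 2n_R$ real-valued model. We can also obtain an equivalent $2n_T T \times 2n_R T$
real model for coded MIMO like (2). The QAM constellations $\mathcal{C}$ can be interpreted as the shifted and scaled
version of a finite subset $\mathcal{A}^n$ of the integer lattice $\mathbb{Z}^n$, i.e., $\mathcal{C} = a(\mathcal{A}^n + [1/2, \ldots, 1/2]^T)$, where the
factor $a$ arises from energy normalization. For example, we have $\mathcal{A} = \{-\sqrt{M}/2, \ldots, \sqrt{M}/2 - 1\}$ for
M-QAM signalling.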

Therefore, with scaling and shifting, we consider the generic $n \times m$ ($m \ge n$) real-valued MIMO system
model

$\mathbf{y} = \mathbf{B}\mathbf{x} + \mathbf{n}$    (4)

where $\mathbf{B} \in \mathbb{R}^{m \times n}$, given by the real-valued equivalent of $(\mathbf{I}_T \otimes \mathbf{H})\mathbf{G}$, can be interpreted as the basis
matrix of the decoding lattice. Obviously, $n = 2n_T T$ and $m = 2n_R T$. The data vector $\mathbf{x}$ is drawn from
a finite subset $\mathcal{A}^n$ to satisfy the power constraint.
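
As an illustration of the real-valued equivalent model for the uncoded case ($T = 1$, $\mathbf{G} = \mathbf{I}$), the following minimal Python sketch stacks real parts over imaginary parts and checks that the real model reproduces the complex one. The function names `complex_to_real` and `matvec` and the numerical values are hypothetical choices made for this example only; noise is omitted.

```python
def complex_to_real(H, y):
    """Map y = H x (complex) to y_r = B x_r (real).

    Real parts are stacked over imaginary parts, so that
    B = [[Re H, -Im H], [Im H, Re H]].
    """
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    y_r = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_r


def matvec(M, v):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Toy channel with 3 receive and 2 transmit antennas (values are arbitrary).
H = [[1 + 2j, 0.5 - 1j], [-1j, 2 + 0j], [0.3 + 0.3j, -1 + 1j]]
x = [1 - 1j, -2 + 3j]
y = matvec(H, x)                       # noiseless complex observation

B, y_r = complex_to_real(H, y)         # 6 x 4 real basis matrix
x_r = [v.real for v in x] + [v.imag for v in x]
```

Here `B` has $m = 6$ rows and $n = 4$ columns, matching $m = 2n_R T$ and $n = 2n_T T$ with $T = 1$.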

A lattice in the m-dimensional Euclidean space $\mathbb{R}^m$ is generated as the integer linear combination of
a set of linearly independent vectors [28], [29]:

$\mathcal{L} \triangleq \mathcal{L}(\mathbf{B}) = \left\{ \sum_{i=1}^{n} x_i \mathbf{b}_i \mid x_i \in \mathbb{Z},\ i = 1, \ldots, n \right\}$,    (5)

where $\mathbb{Z}$ is the set of integers, and $\mathbf{B} = [\mathbf{b}_1 \cdots \mathbf{b}_n]$ represents a basis of the lattice $\mathcal{L}$. In the matrix form,
$\mathcal{L} = \{\mathbf{B}\mathbf{x} : \mathbf{x} \in \mathbb{Z}^n\}$. The lattice can have infinitely many different bases other than $\mathbf{B}$. In general, a
matrix $\mathbf{B}' = \mathbf{B}\mathbf{U}$, where $\mathbf{U}$ is a unimodular matrix, i.e., $\det \mathbf{U} = \pm 1$ and all elements of $\mathbf{U}$ are integers,
is also a basis of $\mathcal{L}$.

Since the vector Bx can be viewed as a lattice point, MIMO decoding can be formulated as a lattice 
decoding problem. The ML decoder computes 

$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathcal{A}^n} \|\mathbf{y} - \mathbf{B}\mathbf{x}\|^2$,    (6)

which amounts to solving a closest-vector problem (CVP) in a finite subset of lattice $\mathcal{L}$. Note that
the complexity of the standard ML decoding that uses exhaustive search is exponential in $n$, and also
increases with the alphabet size. ML decoding may be accomplished by sphere decoding. However,
the expected complexity of sphere decoding is exponential for fixed SNR [7].

A promising approach to reducing the computational complexity of sphere decoding is to relax the 
finite lattice to the infinite lattice and to solve 

$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathbb{Z}^n} \|\mathbf{y} - \mathbf{B}\mathbf{x}\|^2$,    (7)

which could benefit from lattice reduction. The downside is that the found lattice point will not necessarily 
be a valid point in the constellation. 

This search can be carried out more efficiently by lattice reduction-aided decoding [11]. The basic
idea behind this is to use lattice reduction in conjunction with traditional low-complexity decoders. With
lattice reduction, the basis $\mathbf{B}$ is transformed into a new basis consisting of roughly orthogonal vectors

$\mathbf{B}' = \mathbf{B}\mathbf{U}$    (8)

where $\mathbf{U}$ is a unimodular matrix. Indeed, we have the equivalent channel model

$\mathbf{y} = \mathbf{B}'\mathbf{U}^{-1}\mathbf{x} + \mathbf{n} = \mathbf{B}'\mathbf{x}' + \mathbf{n}, \quad \mathbf{x}' = \mathbf{U}^{-1}\mathbf{x}$.

Then conventional decoders (ZF or SIC) are applied on the reduced basis. The resulting estimate $\hat{\mathbf{x}}'$ is then
transformed back into $\hat{\mathbf{x}} = \mathbf{U}\hat{\mathbf{x}}'$. Since the equivalent channel is much more likely to be well-conditioned,
the effect of noise enhancement will be moderated. Again, the resulting estimate $\hat{\mathbf{x}}$ is not necessarily in
$\mathcal{A}^n$; remapping of $\hat{\mathbf{x}}$ onto the finite lattice $\mathcal{A}^n$ is required whenever $\hat{\mathbf{x}} \notin \mathcal{A}^n$.
Babai pre-processed the basis with lattice reduction, then applied either the rounding off (i.e., ZF) or
nearest plane algorithm (i.e., SIC) [10]. For SIC, one performs the QR decomposition $\mathbf{B} = \mathbf{Q}\mathbf{R}$, where
$\mathbf{Q}$ has orthonormal columns and $\mathbf{R}$ is an upper triangular matrix [30]. Multiplying (4) on the left with $\mathbf{Q}^T$,
we have

$\mathbf{y}' = \mathbf{Q}^T\mathbf{y} = \mathbf{R}\mathbf{x} + \mathbf{n}'$.    (9)

In SIC, the last symbol $x_n$ is estimated first as $\hat{x}_n = \lceil y'_n / r_{n,n} \rfloor$. Then the estimate is substituted to
remove the interference term in $y'_{n-1}$ when $x_{n-1}$ is being estimated. The procedure is continued until
the first symbol is detected. That is, we have the following recursion:

$\hat{x}_i = \left\lceil \frac{y'_i - \sum_{j=i+1}^{n} r_{i,j}\hat{x}_j}{r_{i,i}} \right\rfloor$    (10)

for $i = n, n-1, \ldots, 1$.
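
The SIC recursion can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the QR decomposition has already been carried out, so that the inputs are the upper triangular matrix and the rotated observation, and the name `sic_decode` and the toy numbers are hypothetical.

```python
def sic_decode(R, y_prime):
    """Babai nearest-plane (SIC) decoding on y' = R x + n'.

    R is upper triangular with nonzero diagonal. Symbols are detected
    from the last one to the first, cancelling already-detected symbols.
    Note: Python's round() sends exact ties to the even integer.
    """
    n = len(R)
    x_hat = [0] * n
    for i in range(n - 1, -1, -1):
        interference = sum(R[i][j] * x_hat[j] for j in range(i + 1, n))
        x_hat[i] = round((y_prime[i] - interference) / R[i][i])
    return x_hat


# Toy example: small noise, so SIC recovers the transmitted integer vector.
R = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
x_true = [1, -2, 3]
noise = [0.1, -0.2, 0.05]
y_prime = [sum(R[i][j] * x_true[j] for j in range(3)) + noise[i] for i in range(3)]
x_hat = sic_decode(R, y_prime)
```

The cost is quadratic in $n$ once $\mathbf{R}$ and $\mathbf{y}'$ are available, since each of the $n$ steps cancels at most $n$ interference terms.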

It is known that SIC finds the closest vector if the distance from input vector $\mathbf{y}$ to the lattice $\mathcal{L}$ is less
than half the length of the shortest Gram-Schmidt vector. In other words, for SIC the minimum distance
from a lattice point to the boundary of the decision region is given by

$d_{\min,\mathrm{SIC}} = \frac{1}{2} \min_{1 \le i \le n} \|\hat{\mathbf{b}}_i\|$.    (11)

Here the Gram-Schmidt vectors corresponding to a basis $\mathbf{b}_1, \ldots, \mathbf{b}_n$ are the vectors $\hat{\mathbf{b}}_1, \ldots, \hat{\mathbf{b}}_n$, where $\hat{\mathbf{b}}_i$ is
the projection of $\mathbf{b}_i$ orthogonal to the vector space generated by $\mathbf{b}_1, \ldots, \mathbf{b}_{i-1}$. These are the vectors found
by the Gram-Schmidt algorithm for orthogonalization.

In order to quantify the worst-case loss in the minimum squared Euclidean distance relative to infinite
lattice decoding (ILD), [15] defined the proximity factor for SIC

$PF = \frac{d^2_{\min,\mathrm{ILD}}}{d^2_{\min,\mathrm{SIC}}}$,    (12)

and proved that under LLL reduction

$PF \le \beta^{n-1}, \quad \beta = (\delta - 1/4)^{-1}$    (13)

where $1/4 < \delta \le 1$ is a parameter associated with LLL reduction [31]. Meanwhile, if one applies dual
KZ reduction, then [15]

$PF \le n^2$.    (14)

Obviously, the gap to optimum decoding widens with $n$, although SIC has very low complexity $O(n^2)$
excluding the QR decomposition.

TABLE I

Pseudocode for Klein's algorithm in recursive form

Function Klein$_A(\mathbf{y}, i)$
  if $i = 0$ then
    return $\mathbf{y}$
  else
    Let $r_i\hat{\mathbf{b}}_i$ be the projection of $\mathbf{y}$ in the direction of $\hat{\mathbf{b}}_i$
    $c_i = A\|\hat{\mathbf{b}}_i\|^2$
    $x_i$ = Rand_Round$_{c_i}(r_i)$
    $\mathbf{y}' = \mathbf{y} + (x_i - r_i)\hat{\mathbf{b}}_i$
    return Klein$_A(\mathbf{y}' - x_i\mathbf{b}_i, i - 1) + x_i\mathbf{b}_i$
  end if



III. RANDOMIZED LATTICE DECODING 
Klein [18] proposed a randomized algorithm that pushed up the minimum distance to

$d_{\min,\mathrm{Klein}} = k \min_{1 \le i \le n} \|\hat{\mathbf{b}}_i\|$.

The parameter $k$ could take any value, but it was only useful when $1/2 < k < \sqrt{n}/2$, since in other
regions Babai and Kannan's algorithms would be more efficient. Its complexity is $n^{k^2+O(1)}$, which is
polynomial for fixed $k$. Klein described his randomized algorithm in the recursive form, shown in Table I.

In essence, Klein's algorithm is a randomized version of SIC, where standard rounding in SIC is 
replaced by randomized rounding. Here, we rewrite it into the non-recursive form more familiar to the 
communications community. It is summarized by the pseudocode of the function Rand_SIC$_A(\mathbf{y}')$ in
Table II. Rather than returning a vector in the lattice as in Table I, it returns the data estimate $\hat{\mathbf{x}}$. We
also assume that the pre-processing of $\mathbf{Q}$ has been done, hence the input $\mathbf{y}' = \mathbf{Q}^T\mathbf{y}$ rather than $\mathbf{y}$. This
will reduce the complexity since we will call it many times. 

The randomized SIC randomly samples a lattice point $\mathbf{z}$ that is close to $\mathbf{y}$. To obtain the closest lattice
point, one calls Rand_SIC $K$ times and chooses the closest among those lattice points returned, with a
large $K$. The function Rand_Round$_c(r)$ rounds $r$ randomly to an integer $Q$ according to the following
discrete Gaussian distribution [18]

$P(Q = q) = e^{-c(r-q)^2}/s, \quad s = \sum_{q=-\infty}^{\infty} e^{-c(r-q)^2}$.    (15)

If $c$ is large, Rand_Round reduces to standard rounding (i.e., the decision is confident); if $c$ is small, it makes
a guess (i.e., the decision is unconfident).

TABLE II

Pseudocode for the randomized SIC in sequential form

Function Rand_SIC$_A(\mathbf{y}')$
  for $i = n$ to 1 do
    $c_i \leftarrow A r_{i,i}^2$
    $\hat{x}_i \leftarrow$ Rand_Round$_{c_i}\left( (y'_i - \sum_{j=i+1}^{n} r_{i,j}\hat{x}_j)/r_{i,i} \right)$
  end for
  return $\hat{\mathbf{x}}$
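
The randomized SIC and its rounding step can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the names `rand_round` and `rand_sic` are hypothetical, the infinite support of the discrete Gaussian is cut to `span` integers on each side of $\lfloor r \rfloor$ (an assumption of this example), and the weights are rescaled by the largest one purely for numerical stability.

```python
import math
import random


def rand_round(c, r, rng, span=10):
    """Sample an integer Q with P(Q = q) proportional to exp(-c (r - q)^2).

    Assumption of this sketch: the support is limited to `span` integers
    on each side of floor(r); the tail beyond that is ignored.
    """
    base = math.floor(r)
    support = list(range(base - span, base + span + 1))
    d2 = [(r - q) ** 2 for q in support]
    d2_min = min(d2)
    # Rescale by the largest weight so that exp() cannot underflow to all zeros.
    weights = [math.exp(-c * (d - d2_min)) for d in d2]
    return rng.choices(support, weights=weights, k=1)[0]


def rand_sic(R, y_prime, A, rng):
    """Randomized SIC: SIC with standard rounding replaced by rand_round."""
    n = len(R)
    x_hat = [0] * n
    for i in range(n - 1, -1, -1):
        c_i = A * R[i][i] ** 2
        interference = sum(R[i][j] * x_hat[j] for j in range(i + 1, n))
        x_hat[i] = rand_round(c_i, (y_prime[i] - interference) / R[i][i], rng)
    return x_hat


# Toy example (values are arbitrary): draw K = 5 candidate vectors.
R = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
y_prime = [0.2, -2.0, 3.05]
rng = random.Random(1)
samples = [rand_sic(R, y_prime, A=1.0, rng=rng) for _ in range(5)]
```

With a large $A$ the sampler behaves like ordinary SIC, while a small $A$ spreads the samples over more lattice points.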
Lemma 1 ([18]): $s \le s(c) \triangleq \sum_{i \ge 0} \left( e^{-ci^2} + e^{-c(1+i)^2} \right)$.

The proof of the lemma was given in [18] and is omitted here. The next lemma states the probability
that Klein's algorithm or Rand_SIC returns $\mathbf{z} \in \mathcal{L}$.

Lemma 2 ([18]): Let $\mathbf{z}$ be a vector in $\mathcal{L}(\mathbf{B})$ and $\mathbf{y}$ be a vector in $\mathbb{R}^m$. The probability that Klein's
algorithm or Rand_SIC returns $\mathbf{z}$ is bounded by

$P(\mathbf{z}) \ge \frac{1}{\prod_{i=1}^{n} s(A\|\hat{\mathbf{b}}_i\|^2)} e^{-A\|\mathbf{y} - \mathbf{z}\|^2}$.    (16)



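Proof: The proof of the lemma was given in [18] for the recursive Klein algorithm in Table I. Here,
we give a more straightforward proof for Rand_SIC. Let $\mathbf{z} = \xi_1\mathbf{b}_1 + \cdots + \xi_n\mathbf{b}_n \in \mathcal{L}$, $\xi_i \in \mathbb{Z}$, and
consider the invocation of Rand_SIC$_A(\mathbf{y}')$. Using Lemma 1 and (15), the probability of $\hat{x}_i = \xi_i$ is at
least

$\frac{1}{s(A\|\hat{\mathbf{b}}_i\|^2)} e^{-A\|\hat{\mathbf{b}}_i\|^2 \left( \frac{y'_i - \sum_{j=i+1}^{n} r_{i,j}\xi_j}{r_{i,i}} - \xi_i \right)^2} = \frac{1}{s(A\|\hat{\mathbf{b}}_i\|^2)} e^{-A \left( y'_i - \sum_{j=i}^{n} r_{i,j}\xi_j \right)^2}$    (17)

as $r_{i,i} = \|\hat{\mathbf{b}}_i\|$. By multiplying these $n$ probabilities, we obtain a lower bound on the probability that
Rand_SIC returns $\mathbf{z}$

$P(\mathbf{z}) \ge \frac{1}{\prod_{i \le n} s(A\|\hat{\mathbf{b}}_i\|^2)} e^{-A\sum_{i=1}^{n} \left( y'_i - \sum_{j=i}^{n} r_{i,j}\xi_j \right)^2}$
$= \frac{1}{\prod_{i \le n} s(A\|\hat{\mathbf{b}}_i\|^2)} e^{-A\|\mathbf{y}' - \mathbf{R}\boldsymbol{\xi}\|^2}$
$= \frac{1}{\prod_{i \le n} s(A\|\hat{\mathbf{b}}_i\|^2)} e^{-A\|\mathbf{y} - \mathbf{B}\boldsymbol{\xi}\|^2}$.    (18)

So the probability is as stated in Lemma 2. ■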
A salient feature of (16) is that the closest lattice point is the most likely to be sampled. In particular,
it resembles the Gaussian distribution: the closer $\mathbf{z}$ is to $\mathbf{y}$, the more likely it will be sampled.
Klein suggested $A = \log n / \min_i \|\hat{\mathbf{b}}_i\|^2$ and showed that the probability of returning $\mathbf{z} \in \mathcal{L}$ is

$P(\mathbf{z}) = \Omega\left( n^{-\|\mathbf{y} - \mathbf{z}\|^2 / \min_i \|\hat{\mathbf{b}}_i\|^2} \right)$.    (19)

The significance of lattice reduction can be seen here, as increasing $\min_i \|\hat{\mathbf{b}}_i\|^2$ will increase the
probability (19).

As lattice reduction-aided decoding normally ignores the boundary of the constellation, the samples
returned by Rand_SIC$_A(\mathbf{y}')$ come from an extended version of the original constellation. In the final step,
we need to remove those samples that happen to lie outside the boundary of the original constellation
and choose the closest among the remaining lattice points.
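
The final selection step can be sketched as follows. This is a hedged illustration only: the function names, the toy basis and the candidate list are hypothetical, the candidates are assumed to be already expressed in the original (unreduced) coordinates, and the per-dimension bounds `lo = -4`, `hi = 3` correspond to the shifted and scaled 64-QAM alphabet $\{-4, \ldots, 3\}$.

```python
def dist_sq(B, y, x):
    """Squared Euclidean distance ||y - B x||^2 for B given as a list of rows."""
    return sum((y[i] - sum(B[i][j] * x[j] for j in range(len(x)))) ** 2
               for i in range(len(B)))


def pick_closest(samples, B, y, lo, hi):
    """Discard candidates outside [lo, hi]^n, then return the closest one.

    Returns None when every candidate falls outside the constellation.
    """
    valid = [x for x in samples if all(lo <= xi <= hi for xi in x)]
    if not valid:
        return None
    return min(valid, key=lambda x: dist_sq(B, y, x))


# Toy basis and observation (arbitrary values); four hypothetical samples.
B = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
y = [0.2, -2.0, 3.05]
candidates = [[1, -2, 3], [1, -2, 4], [0, -2, 3], [1, -5, 3]]
best = pick_closest(candidates, B, y, lo=-4, hi=3)
```

In this example two of the four candidates violate the bounds and are dropped before the distance comparison. What to do when no valid candidate remains (for instance, falling back to a remapped SIC output) is a design choice left open here.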

IV. ANALYSIS AND OPTIMIZATION 

The list size $K$ is often limited in communications. Given $K$, Klein's choice of the parameter $A =
\log n / \min_i \|\hat{\mathbf{b}}_i\|^2$ is not necessarily optimum. In this Section, we want to answer the following questions
about randomized lattice decoding:

• Given $K$, what is the optimum value of $A$?

• Given $K$ and the associated optimum $A$, how much is the gain in decoding performance?

• What is the limit of randomized lattice decoding?

Indeed, there exists an optimum value of $A$ when $K$ is finite, since $A \to 0$ means uniform sampling
of the entire lattice while $A \to \infty$ means Babai's algorithm. We shall present an approximate analysis
of the optimum $A$ for a given $K$ in the sense of maximizing the minimum distance $d_{\min,\mathrm{Klein}}$, and then
estimate the decoding gain. The analysis is not exact since it is based on the minimum distance only;
nonetheless, it serves as a useful guideline to determine these parameters in practical implementation of
Klein's algorithm.



February 27, 2010 DRAFT



A. Optimum Parameter A 

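The choice of parameter $A$ has a significant impact on the probability that Rand_SIC returns $\mathbf{z} \in \mathcal{L}$. Let
$A = \log\rho / \min_i \|\hat{\mathbf{b}}_i\|^2$, where $\rho > 1$ (so that $A > 0$) is the parameter that is to be optimized. Then we
have $c_i = A\|\hat{\mathbf{b}}_i\|^2 \ge \log\rho$. We use this bound to estimate $s(c_i)$:

$s(c_i) \le \sum_{j \ge 0} \left( \rho^{-j^2} + \rho^{-(1+j)^2} \right)$
$= 1 + 2(\rho^{-1} + \rho^{-4} + \rho^{-9} + \cdots)$
$= 1 + 2/\rho + O(\rho^{-4})$.    (20)

Hence

$\prod_{i=1}^{n} s(c_i) \le \left( \exp(2/\rho + O(\rho^{-4})) \right)^n = e^{\frac{2n}{\rho}(1+o(1))}$.    (21)

With this choice of parameter $A$, (16) is lower-bounded by

$P(\mathbf{z}) \ge e^{-\frac{2n}{\rho}(1+o(1))} \cdot \rho^{-\|\mathbf{y} - \mathbf{z}\|^2 / \min_{1 \le i \le n} \|\hat{\mathbf{b}}_i\|^2}$.    (22)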

Now, let $\mathbf{z}_K$ be a point in the lattice with $P(\mathbf{z}_K) \ge 1/K$. With $K$ calls to Klein's algorithm, the
probability of missing $\mathbf{z}_K$ is not larger than $(1 - 1/K)^K < 1/e$. Therefore, any such lattice point $\mathbf{z}_K$ is
found with probability $\ge 1 - 1/e$. From (22), we obtain

$e^{-\frac{2n}{\rho}(1+o(1))} \cdot \rho^{-\|\mathbf{y} - \mathbf{z}_K\|^2 / \min_{1 \le i \le n} \|\hat{\mathbf{b}}_i\|^2} \simeq \frac{1}{K}$.    (23)

$K$ calls to Rand_SIC can find the closest lattice point if the distance from input vector $\mathbf{y}$ to the lattice is
less than $\|\mathbf{y} - \mathbf{z}_K\|$ (such a point $\mathbf{z}_K \in \mathcal{L}$ may not exist if $K$ is too small). Of course, only $\min_{\mathbf{z}_K} \|\mathbf{y} - \mathbf{z}_K\|$
matters when there is more than one such lattice point. In this sense, $\min_{\mathbf{z}_K} \|\mathbf{y} - \mathbf{z}_K\|$ can be thought
of as the bounded distance of Rand_SIC. We point out that $\|\mathbf{y} - \mathbf{z}_K\|$ itself is not exactly the minimum
distance, and it could be larger than $d_{\min,\mathrm{ML}}$, the minimum distance of the ML decoder, but we are mostly
interested in the case $\|\mathbf{y} - \mathbf{z}_K\| < d_{\min,\mathrm{ML}}$ for complexity reasons. Moreover, $\|\mathbf{y} - \mathbf{z}_K\|$ gives a tractable
measure to optimize. For this reason, defining the pseudo minimum distance $d_{\min,\mathrm{Random}} \triangleq \|\mathbf{y} - \mathbf{z}_K\|$,
we can derive from (23)

$d_{\min,\mathrm{Random}} \simeq \min_{1 \le i \le n} \|\hat{\mathbf{b}}_i\| \sqrt{\log_\rho\left( K e^{-2n/\rho} \right)}$.    (24)

It is natural that $\rho$ is chosen to maximize the value of $d_{\min,\mathrm{Random}}$ for the best decoding performance. Let
the derivative of the right side of (24) with respect to $\rho$ be zero; since the square root is monotone, it
suffices to differentiate its square:

$\frac{\partial d^2_{\min,\mathrm{Random}}}{\partial\rho} = \min_{1 \le i \le n} \|\hat{\mathbf{b}}_i\|^2 \left( \frac{2n}{\rho^2\log\rho} + \frac{2n}{\rho^2\log^2\rho} - \frac{\log K}{\rho\log^2\rho} \right) = 0$.    (25)

Because $\rho > 1$, we have

$\log K = \frac{2n}{\rho}\log(e\rho)$.    (26)

Consequently, the optimum $\rho_0$ can be determined from the following equation

$K = (e\rho_0)^{2n/\rho_0}$.    (27)

By substituting (27) back into (24), we get

$d_{\min,\mathrm{Random}} \simeq \sqrt{\frac{2n}{\rho_0}} \min_{1 \le i \le n} \|\hat{\mathbf{b}}_i\|$.    (28)

To further see the relation between $\rho_0$ and $K$, we calculate the derivative of the function $f(\rho) =
(e\rho)^{2n/\rho}$, $\rho > 1$, with respect to $\rho$. It follows that

$\log f(\rho) = \frac{2n}{\rho}\log(e\rho)$,

$\frac{1}{f(\rho)}\frac{\partial f(\rho)}{\partial\rho} = -\frac{2n}{\rho^2}\log(e\rho) + \frac{2n}{\rho^2} = -\frac{2n}{\rho^2}\log\rho$.

Hence

$\frac{\partial f(\rho)}{\partial\rho} = -f(\rho)\frac{2n}{\rho^2}\log\rho = -\frac{2n}{\rho^2}(e\rho)^{2n/\rho}\log\rho < 0, \quad \rho > 1$.

Therefore, $f(\rho) = (e\rho)^{2n/\rho}$ is a monotonically decreasing function when $\rho > 1$. Then, we can check that
a large value of $A$ is required for a small list size $K$, while $A$ has to be decreased for a large list size
$K$. It is easy to see that Klein's choice of parameter $A$, i.e., $\rho = n$, is only optimum when $K \approx (en)^2$.
If we choose $K < (en)^2$ to reduce the implementation complexity, then $\rho_0 > n$.

Fig. 1 shows the bit error rate against $\log\rho$ for decoding a 10 x 10 (i.e., $n_T = n_R = 10$) uncoded MIMO
system with $K = 20$, when $E_b/N_0 = 19$ dB. It can be derived from (27) that $\log\rho_0 = 4.27$. Simulation
results confirm the choice of the optimal $\rho$ offered by (27) with the aim of maximizing $d_{\min,\mathrm{Random}}$.
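
Since $(e\rho)^{2n/\rho}$ is monotonically decreasing for $\rho > 1$, the optimum parameter can be found numerically by bisection on the logarithm of (27). The sketch below is an illustration under stated assumptions: the function name `optimal_rho` is hypothetical, natural logarithms are used, and the real dimension is taken as $n = 2n_T = 20$ for the 10 x 10 system, which is our reading of the setup.

```python
import math


def optimal_rho(K, n, tol=1e-12):
    """Solve K = (e * rho)^(2n / rho) for rho > 1 by bisection.

    A root with rho > 1 exists only when 1 < K < exp(2n).
    """
    target = math.log(K)
    if not 0.0 < target < 2.0 * n:
        raise ValueError("need 1 < K < exp(2n)")

    def g(rho):
        # log of (e * rho)^(2n / rho); decreasing in rho for rho > 1
        return (2.0 * n / rho) * (1.0 + math.log(rho))

    lo, hi = 1.0, 2.0
    while g(hi) > target:          # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rho0 = optimal_rho(K=20, n=20)
log_rho0 = math.log(rho0)          # close to the value quoted in the text
```

Under these assumptions the root has $\log\rho_0$ of roughly 4.25, in line with the quoted 4.27.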

Fig. 1. BER vs. $\log\rho$ for a 10 x 10 uncoded system using 64-QAM, $K = 20$ and SNR per bit = 19 dB.

B. Effect of K on Performance Gain 

We shall determine the performance gain of randomized lattice decoding over Babai's decoding. Following
[15], we see that the gain in squared minimum distance is

$G \triangleq \frac{d^2_{\min,\mathrm{Random}}}{d^2_{\min,\mathrm{SIC}}}$.

Since $d_{\min,\mathrm{Random}}$ is just the pseudo minimum distance, this estimate of $G$ can be optimistic. From (11)
and (28), we get

$G \le 8n/\rho_0, \quad \rho_0 > 1$.    (29)

By substituting (29) in (27), we have

$K \ge (8en/G)^{G/4}, \quad G < 8n$.    (30)

Equation (30) reveals the relation between $G$ and $K$. Larger $G$ requires larger $K$. For fixed performance
gain $G$, the computational complexity of randomized lattice decoding is polynomial in $n$.
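
To make the relation between $G$ and $K$ concrete, the following sketch evaluates (30) taken with equality. The function name `list_size_for_gain` and the choice $n = 20$ are hypothetical, and the resulting numbers inherit the optimism of the estimate of $G$.

```python
import math


def list_size_for_gain(G, n):
    """List size K = (8 e n / G)^(G / 4) suggested by (30), valid for 0 < G < 8n."""
    if not 0 < G < 8 * n:
        raise ValueError("requires 0 < G < 8n")
    return (8.0 * math.e * n / G) ** (G / 4.0)


n = 20
# Gains of about 3, 6, 9 and 12 dB correspond to G = 2, 4, 8 and 16.
sizes = {G: list_size_for_gain(G, n) for G in (2, 4, 8, 16)}
```

For fixed $G$ the list size grows as $n^{G/4}$, so each additional 3 dB of targeted gain doubles the exponent of $n$.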

Table III shows the computational complexity required to achieve the performance gain from 3 dB
to 12 dB. It can be seen that, if $n$ is moderate and if $G$ is not too big, $K$ is affordable to recover a
significant fraction of SNR loss relative to ML decoding.

To achieve near-ML performance, $G$ should be approximately equal to the proximity factor. It is known
that the real gap to ML decoding is much smaller than the worst-case bounds (13) and (14). Thus, we can
run simulations to estimate the gap, which is often less than 10 dB when $n$ is not too large. Therefore,
near-ML performance is achievable for small to moderate values of $n$ if the following condition is
satisfied:

$PF \approx G = 8n/\rho_0, \quad \rho_0 > 1$.    (31)

Then, we can determine the list size $K$ from (27).

TABLE III

Required value of K to achieve gain G in randomized lattice decoding (the complexity excludes
pre-processing)

Gain in dB    G     $\rho_0$    K               Complexity
3             2     $4n$        $\sqrt{4en}$    $O(n^{5/2})$
6             4     $2n$        $2en$           $O(n^3)$
9             8     $n$         $(en)^2$        $O(n^4)$
12            16    $n/2$       $(en/2)^4$      $O(n^6)$

We point out that Table III should be used with caution, as the estimate of $G$ is optimistic. The real gain
certainly cannot be larger than the gap to ML decoding. Moreover, the closer Klein's algorithm performs 
to ML decoding, the more optimistic the estimate will be. This is because the minimum distance alone 
does not precisely characterize the performance. 

C. Limits 

Randomized lattice decoding has its limits. Because equation (29) only holds when $\rho_0 > 1$, we must have
$G < 8n$. Obviously $K \to e^{2n}$ as $G \to 8n$ (i.e., $\rho_0 \to 1$). From (28), $k \to \sqrt{2n}$ as $\rho_0 \to 1$. That is, we
can achieve bounded-distance decoding for $k \to \sqrt{2n}$ at the complexity $K \to e^{2n}$. Despite its exponential
complexity, this is actually more encouraging than Klein's original analysis of the complexity, which is
$O(n^n)$ for $k = \sqrt{n}$.

On the other hand, if we do want to achieve $G > 8n$, randomized lattice decoding will not be useful.
This is because $\rho_0 = 1$ ($A = 0$) for $K \ge e^{2n}$, i.e., it reduces to uniform sampling. One can still apply
Klein's choice $\rho = n$, but it will be less efficient than uniform sampling, even if $K$ is super-exponential
in $n$. Therefore, as $PF \to \infty$, randomized lattice decoding might be even worse than sphere decoding if one
sticks to ML decoding.

V. IMPLEMENTATION

In this Section, we address several issues of implementation. In particular, we propose an efficient 
implementation of Klein's decoder, extend it to complex-valued lattices, and to MMSE. 

A. Efficient Randomized Rounding 

The core of Klein's decoder is the randomized rounding with respect to the discrete Gaussian distribution
(15). Unfortunately, it cannot be generated by simply quantizing the continuous Gaussian distribution. A
rejection algorithm is given in Exercise 3 of [32] to generate a random variable with the discrete Gaussian
distribution from the continuous Gaussian distribution, which is efficient only when the variance is large.
From (15), the variance in our problem is less than $1/\log\rho_0$. From the analysis in Section IV, we
recognize that $\rho_0$ can be large, especially for small $K$. Therefore, the implementation complexity can be
high.

Here, we propose an efficient implementation of random rounding by truncating the discrete Gaussian 
distribution and prove the accuracy of this truncation. Efficient generation of Q results in high decoding 
speed. 

In order to generate the random integer $Q$ with distribution (15), a naive way is to calculate the
cumulative distribution function

$F_{c,r}(q) \triangleq P(Q \le q) = \sum_{i \le q} P(Q = i)$.    (32)

Obviously, $P(Q = q) = F_{c,r}(q) - F_{c,r}(q - 1)$. Therefore, we generate a real-valued random number $z$
that is uniformly distributed on $[0, 1]$; then we let $Q = q$ if $F_{c,r}(q - 1) \le z < F_{c,r}(q)$. A problem is
that this has to be done online, since $F_{c,r}(q)$ depends on $c$ and $r$. The implementation complexity can
be high, which will slow down decoding.

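We now try to find a good approximation to distribution (15). Write $r = \lfloor r \rfloor + a$, where $0 \le a < 1$.
Let $b = 1 - a$. Distribution (15) can be rewritten as follows

$P(Q = q) = \begin{cases} e^{-c(a+i)^2}/s, & q = \lfloor r \rfloor - i \\ e^{-c(b+i)^2}/s, & q = \lfloor r \rfloor + 1 + i \end{cases}$    (33)

where $i \ge 0$ is an integer and

$s = \sum_{i \ge 0} \left( e^{-c(a+i)^2} + e^{-c(b+i)^2} \right)$.

Because $A = \log\rho / \min_i \|\hat{\mathbf{b}}_i\|^2$, for every invocation of Rand_Round$_c(r)$, we have $c \ge \log\rho$. We use this
bound to estimate the probability that $r$ is rounded to the $2N$-integer set $\{\lfloor r \rfloor - N + 1, \ldots, \lfloor r \rfloor, \ldots, \lfloor r \rfloor + N\}$.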
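Now the probability $P_{2N}$ that $q$ is not one of these $2N$ points can be bounded as

$P_{2N} = \sum_{i \ge N} \left( e^{-c(a+i)^2} + e^{-c(b+i)^2} \right)/s$
$\le \left( 1 + \rho^{-(2N+1)} + \rho^{-(4N+4)} + \cdots \right) \left( e^{-c(a+N)^2} + e^{-c(b+N)^2} \right)/s$.    (34)

But, since $s \ge e^{-ca^2}$ and $s \ge e^{-cb^2}$, we have

$P_{2N} \le \left( 1 + O(\rho^{-(2N+1)}) \right) \left( e^{-c(2aN+N^2)} + e^{-c(2bN+N^2)} \right) \le 2\left( 1 + O(\rho^{-(2N+1)}) \right) e^{-N^2 c}$.    (35)

Hence

$P_{2N} = O(\rho^{-N^2})$.    (36)

Since $\rho > 1$, the tail bound (35) decays very fast. Consequently, it is almost sure that a call to
Rand_Round$_c(r)$ returns an integer in $\{\lfloor r \rfloor - N + 1, \ldots, \lfloor r \rfloor, \ldots, \lfloor r \rfloor + N\}$ as long as $N$ is not too small.
Therefore, we can approximate distribution (15) by a $2N$-point discrete distribution as follows.

$P(Q = q) = \begin{cases} e^{-c(a+N-1)^2}/s', & q = \lfloor r \rfloor - N + 1 \\ \quad\vdots \\ e^{-ca^2}/s', & q = \lfloor r \rfloor \\ e^{-cb^2}/s', & q = \lfloor r \rfloor + 1 \\ \quad\vdots \\ e^{-c(b+N-1)^2}/s', & q = \lfloor r \rfloor + N \end{cases}$    (37)

where

$s' = \sum_{i=0}^{N-1} \left( e^{-c(a+i)^2} + e^{-c(b+i)^2} \right)$.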

Fig. 2 shows the distribution (15) when $r = -5.87$ and $c = 3.16$. The distribution of $Q$ tends to
concentrate at $\lfloor r \rfloor = -6$ and $\lfloor r \rfloor + 1 = -5$, with probability 0.9 and 0.08, respectively. Fig. 3 compares the
bit error rates associated with different $N$ for an uncoded 10 x 10 ($n_T = n_R = 10$) system with $K = 20$.

Fig. 2. Distribution of $Q$ for $r = -5.87$ and $c = 3.16$. $P(Q = -7) = 0.02$, $P(Q = -6) = 0.9$ and $P(Q = -5) = 0.08$.

Fig. 3. Bit error rate vs. average SNR per bit for a 10 x 10 uncoded system using 64-QAM.



It is seen that the choice of $N = 2$ is indistinguishable from larger $N$. In fact, it is often adequate to
choose a 3-point approximation, as the probability in the central 3 points is almost 1.
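
The truncated rounding can be sketched in Python as below. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the sampling uses a plain inverse-CDF scan over the $2N$ retained points, which is one of several possible ways to draw from the truncated distribution.

```python
import math
import random


def truncated_pmf(c, r, N):
    """2N-point approximation of the discrete Gaussian around r.

    The support is floor(r) - N + 1, ..., floor(r) + N, with weights
    exp(-c (a + i)^2) on the left and exp(-c (b + i)^2) on the right,
    where a = r - floor(r) and b = 1 - a.
    """
    base = math.floor(r)
    a = r - base
    b = 1.0 - a
    points, weights = [], []
    for i in range(N - 1, -1, -1):      # left side: q = base - i
        points.append(base - i)
        weights.append(math.exp(-c * (a + i) ** 2))
    for i in range(N):                  # right side: q = base + 1 + i
        points.append(base + 1 + i)
        weights.append(math.exp(-c * (b + i) ** 2))
    s_prime = sum(weights)
    return points, [w / s_prime for w in weights]


def rand_round_truncated(c, r, N, rng):
    """Draw one integer from the truncated distribution by inverse-CDF scan."""
    points, probs = truncated_pmf(c, r, N)
    z = rng.random()
    acc = 0.0
    for q, p in zip(points, probs):
        acc += p
        if z < acc:
            return q
    return points[-1]                   # guard against floating-point round-off


# The example of Fig. 2: r = -5.87, c = 3.16, with N = 2 (four points).
points, probs = truncated_pmf(c=3.16, r=-5.87, N=2)
sample = rand_round_truncated(3.16, -5.87, 2, random.Random(0))
```

For these values almost all of the probability mass sits on $-6$ and $-5$, consistent with the concentration described for Fig. 2.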

The following lemma provides a theoretical explanation to the above observations. 

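Lemma 3: Let $D$ ($D(i) = P(Q = i)$) be the non-truncated discrete Gaussian distribution, and $D'$ be
the truncated $2N$-point distribution. Then the statistical distance between $D$ and $D'$ satisfies:

$\Delta(D, D') \triangleq \frac{1}{2} \sum_{i \in \mathbb{Z}} |D(i) - D'(i)| = O(\rho^{-N^2})$.

Proof: By definition of $D'$, we have:

$\Delta(D, D') = \frac{1}{2} \left( \sum_{i < \lfloor r \rfloor - N + 1} D(i) + \sum_{i > \lfloor r \rfloor + N} D(i) + \left( \frac{s}{s'} - 1 \right) \sum_{i = \lfloor r \rfloor - N + 1}^{\lfloor r \rfloor + N} D(i) \right)$
$= \sum_{i < \lfloor r \rfloor - N + 1} D(i) + \sum_{i > \lfloor r \rfloor + N} D(i)$,

where $s = \sum_{i \ge 0} \left( e^{-c(a+i)^2} + e^{-c(b+i)^2} \right)$ and $s' = \sum_{i=0}^{N-1} \left( e^{-c(a+i)^2} + e^{-c(b+i)^2} \right)$. The result then derives
from (35). ■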



As a consequence, the statistical distance between the tuples of distributions used by $K$ calls to
Klein's algorithm corresponding to the non-truncated and truncated Gaussians is $O(nK\rho^{-N^2})$. An important
property of the statistical distance is that an algorithm behaves similarly if fed two nearby
distributions. More precisely, if the output satisfies a property with probability $p$ when the algorithm uses
a distribution $D_1$, then the property is still satisfied with probability $\ge p - \Delta(D_1, D_2)$ if fed $D_2$ instead
of $D_1$ (see [33, Chap. 8]).
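
The statistical distance in Lemma 3 can be checked numerically. The sketch below is a hedged illustration with hypothetical names: the integers are replaced by a wide finite window of 81 points (an assumption that is harmless here because the neglected weights underflow to zero), and the values of $c$, $r$ and $N$ are those of the Fig. 2 example.

```python
import math


def pmf(c, r, support):
    """Discrete Gaussian exp(-c (r - q)^2), normalized over the given support."""
    w = {q: math.exp(-c * (r - q) ** 2) for q in support}
    total = sum(w.values())
    return {q: v / total for q, v in w.items()}


c, r, N = 3.16, -5.87, 2
base = math.floor(r)
wide = range(base - 40, base + 41)            # stands in for all integers
narrow = range(base - N + 1, base + N + 1)    # the 2N retained points

D = pmf(c, r, wide)                           # (numerically) non-truncated
D_trunc = pmf(c, r, narrow)                   # truncated 2N-point version

# Statistical distance: half the L1 distance between the two distributions.
delta = 0.5 * sum(abs(D[q] - D_trunc.get(q, 0.0)) for q in wide)

# Probability mass that the truncation throws away.
tail = sum(p for q, p in D.items() if q not in narrow)
```

The two quantities `delta` and `tail` coincide up to rounding error, which is the identity used in the proof: the distance equals the discarded tail mass and therefore decays like $e^{-cN^2}$.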

B. Complex Randomized Lattice Decoding 

Since the traditional lattice formulation is only directly applicable to a real-valued channel matrix, 
the randomized lattice decoding was given for the real-valued equivalent of the complex-valued channel 
matrix. This approach doubles the channel matrix dimension and may lead to higher complexity. From 
the complex lattice viewpoint |[34l . we study the complex randomized lattice decoding. The advantage of 
this algorithm is that it reduces the computational complexity by incorporating complex LLL reduction 
El. 

Due to the orthogonality of the real and imaginary parts of the complex subchannel, the real and imaginary 
parts of the transmit symbols are decoded in the same step. This allows us to derive complex randomized 
lattice decoding by performing randomized rounding for the real and imaginary parts of the received 
vector separately. 
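The separate treatment of real and imaginary parts can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation: `rand_round` is a hypothetical helper implementing a 2N-point truncated discrete Gaussian, and `complex_rand_round` simply applies it to each component.

```python
import math
import random

def rand_round(r, c, N=2, rng=random):
    """Randomly round the real number r; weight of x is exp(-c * (x - r)**2)."""
    fl = math.floor(r)
    points = [fl - i for i in range(N)] + [fl + 1 + i for i in range(N)]
    sq = [(x - r) ** 2 for x in points]
    m = min(sq)  # guard against underflow of all weights
    weights = [math.exp(-c * (d - m)) for d in sq]
    return rng.choices(points, weights=weights, k=1)[0]

def complex_rand_round(z, c, N=2, rng=random):
    """Round a complex number to a Gaussian integer by rounding the real and
    imaginary parts independently with the same parameter c."""
    return complex(rand_round(z.real, c, N, rng), rand_round(z.imag, c, N, rng))
```

Because the two components are drawn independently, the probability of a given Gaussian integer is the product of the two one-dimensional probabilities.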

In this sense, given the real part of the input $\mathbf{y}$, randomized lattice decoding returns the real part of $\mathbf{z}$ with 
probability 

$$P(\Re(\mathbf{z})) \geq \frac{1}{\prod_{i \leq n} s(A\|\hat{\mathbf{b}}_i\|^2)} \, e^{-A\|\Re(\mathbf{y}) - \Re(\mathbf{z})\|^2}. \tag{38}$$

Similarly, given the imaginary part of the input $\mathbf{y}$, randomized lattice decoding returns the imaginary part of 
$\mathbf{z}$ with probability 

$$P(\Im(\mathbf{z})) \geq \frac{1}{\prod_{i \leq n} s(A\|\hat{\mathbf{b}}_i\|^2)} \, e^{-A\|\Im(\mathbf{y}) - \Im(\mathbf{z})\|^2}. \tag{39}$$

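By multiplying these two probabilities, we get a lower bound on the probability that complex 
randomized lattice decoding returns $\mathbf{z}$: 

$$\begin{aligned}
P(\mathbf{z}) &= P(\Re(\mathbf{z})) \cdot P(\Im(\mathbf{z})) \\
&\geq \frac{1}{\prod_{i \leq n} s^2(A\|\hat{\mathbf{b}}_i\|^2)} \, e^{-A\left(\|\Re(\mathbf{y}) - \Re(\mathbf{z})\|^2 + \|\Im(\mathbf{y}) - \Im(\mathbf{z})\|^2\right)} \\
&= \frac{1}{\prod_{i \leq n} s^2(A\|\hat{\mathbf{b}}_i\|^2)} \, e^{-A\|\mathbf{y} - \mathbf{z}\|^2}.
\end{aligned} \tag{40}$$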

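Let $A = \log\rho / \min_i \|\hat{\mathbf{b}}_i\|^2$, where $\rho > 1$. Along the same lines as the analysis in the preceding section, 
we can easily obtain 

$$P(\mathbf{z}) > e^{-4n/\rho} \cdot \rho^{-\|\mathbf{y} - \mathbf{z}\|^2 / \min_{1 \leq i \leq n} \|\hat{\mathbf{b}}_i\|^2}. \tag{41}$$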

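Given K calls, inequality (41) implies the choice of the optimum value of $\rho$: 

$$K = (e\rho_0)^{4n/\rho_0}, \tag{42}$$

and the minimum distance of complex randomized lattice decoding 

$$d^{C}_{\min,\mathrm{Random}} \triangleq 2\sqrt{\frac{4n}{\rho_0}} \, \min_{1 \leq i \leq n} \|\hat{\mathbf{b}}_i\|. \tag{43}$$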

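Let us compare with the 2n-dimensional real randomized lattice decoding, for which 

$$d_{\min,\mathrm{Random}} \triangleq 2\sqrt{\frac{4n}{\rho_0}} \, \min_{1 \leq i \leq 2n} \|\hat{\mathbf{b}}_i\|. \tag{44}$$

We have 

$$d^{C}_{\min,\mathrm{Random}} = d_{\min,\mathrm{Random}}, \tag{45}$$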



which means real randomized lattice decoding and complex randomized lattice decoding have the same 
decision region. They also have the same parameter A for the same K. 
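The relation between the number of samples and the parameter $\rho_0$ can be made concrete with a few lines of Python. This is a sketch under stated assumptions: the complex-case expression $K = (e\rho_0)^{4n/\rho_0}$ is the reading of (42) used here, the real-case counterpart $K = (e\rho_0)^{2n'/\rho_0}$ for real dimension $n'$ is taken from the earlier real-valued analysis, and the gain $16n/\rho_0$ assumes that the squared minimum distance is measured relative to Babai's $\min_i \|\hat{\mathbf{b}}_i\|^2$. All function names are hypothetical.

```python
import math

def samples_complex(rho0, n):
    """Number of samples K for complex dimension n, assuming K = (e*rho0)**(4n/rho0)."""
    return (math.e * rho0) ** (4.0 * n / rho0)

def samples_real(rho0, n_real):
    """Number of samples K for real dimension n_real, assuming K = (e*rho0)**(2*n_real/rho0)."""
    return (math.e * rho0) ** (2.0 * n_real / rho0)

def gain_db(rho0, n):
    """Squared-distance gain over Babai in dB for complex dimension n,
    under the assumption that the gain equals 16n/rho0."""
    return 10.0 * math.log10(16.0 * n / rho0)

if __name__ == "__main__":
    n = 16  # complex dimension, i.e. real dimension 32
    for rho0 in (128.0, 81.0, 64.0):
        print(rho0, round(samples_complex(rho0, n), 1), round(gain_db(rho0, n), 2))
```

For n = 16 and $\rho_0 = 64$ this gives K of about 174 at a gain of about 6 dB, in line with the K = 174 (G = 6 dB) operating point quoted for the 4 x 4 perfect code in the simulation section. The complex n-dimensional and real 2n-dimensional expressions return the same K for the same $\rho_0$.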

C. MMSE-Based Randomized Lattice Decoding 

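The MMSE detector takes the SNR term into account and thereby leads to improved performance. 
As shown in [11], the MMSE detector is equal to ZF with respect to an extended system model. To this end, 
we define the $(m + n) \times n$ extended channel matrix $\underline{\mathbf{B}}$ and the $(m + n) \times 1$ extended receive vector $\underline{\mathbf{y}}$ by 

$$\underline{\mathbf{B}} = \begin{bmatrix} \mathbf{B} \\ \sigma \mathbf{I}_n \end{bmatrix} \quad \text{and} \quad \underline{\mathbf{y}} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0}_{n,1} \end{bmatrix},$$

where $\sigma$ is the noise standard deviation normalized by that of the transmitted symbols. 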

This viewpoint allows us to incorporate the MMSE criterion in the real and complex randomized lattice 
decoding schemes. 
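The extended system model can be illustrated with a short pure-Python sketch. It assumes the standard construction in which $\sigma\mathbf{I}_n$ is stacked below the channel matrix and n zeros are appended to the received vector; the helper names (`extend_system`, `gram`, `correlate`) are hypothetical. The check at the end shows why ZF on the extended system coincides with MMSE on the original one: the normal equations acquire the regularization term $\sigma^2\mathbf{I}_n$ while the right-hand side is unchanged.

```python
def extend_system(B, y, sigma):
    """Stack sigma * I_n below B and append n zeros to y."""
    n = len(B[0])
    B_ext = [list(row) for row in B]
    B_ext += [[sigma if i == j else 0.0 for j in range(n)] for i in range(n)]
    y_ext = list(y) + [0.0] * n
    return B_ext, y_ext

def gram(M):
    """Return M^T M for a matrix given as a list of rows."""
    cols = list(zip(*M))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def correlate(M, v):
    """Return M^T v for a matrix M (list of rows) and a vector v."""
    return [sum(a * b for a, b in zip(col, v)) for col in zip(*M)]

if __name__ == "__main__":
    B = [[2.0, 1.0], [0.0, 1.0], [1.0, 3.0]]
    y = [1.0, 2.0, 3.0]
    B_ext, y_ext = extend_system(B, y, sigma=0.5)
    print(gram(B_ext))            # equals B^T B + sigma^2 I
    print(correlate(B_ext, y_ext))  # equals B^T y
```

Any decoder written for the real or complex lattice model can then be run unchanged on the extended pair.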

D. Other issues 

Each call to Rand_SIC incurs only $O(n^2)$ complexity. Thus, the complexity of randomized lattice 
decoding is $O(Kn^2)$, excluding pre-processing (lattice reduction and QR decomposition). Meanwhile, 
randomized lattice decoding allows for fully parallel implementation, since the samples can be taken 
independently of each other. Thus the decoding speed could be as high as that of a standard 
lattice-reduction-aided decoder if it is implemented in parallel. 

Since Klein's decoding is random, there is a small chance that all K samples are farther from the 
received vector than the Babai point. Therefore, it is worthwhile to always run Babai's algorithm at the very beginning. 

The sampling can be stopped as soon as the nearest sample point found has distance $< \frac{1}{2} \min_{1 \leq i \leq n} \|\hat{\mathbf{b}}_i\|$. 
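The overall procedure (Babai first, then up to K random samples with early stopping) can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not spelled out in the text: the received vector has already been rotated by the transpose of Q, so the problem is to minimize the distance between y and Rx for an upper-triangular R; the norm of the i-th Gram-Schmidt vector is taken as |R[i][i]|; the parameter rho is supplied by the caller; and all function names are hypothetical.

```python
import math
import random

def rand_round(r, c, N, rng):
    """2N-point truncated discrete Gaussian rounding of r with parameter c."""
    fl = math.floor(r)
    points = [fl - i for i in range(N)] + [fl + 1 + i for i in range(N)]
    sq = [(x - r) ** 2 for x in points]
    m = min(sq)  # guard against underflow of all weights
    return rng.choices(points, weights=[math.exp(-c * (d - m)) for d in sq], k=1)[0]

def sic(R, y, A=None, N=2, rng=random):
    """Back-substitution with standard rounding (A is None, i.e. Babai) or
    with random rounding using parameter c_i = A * R[i][i]**2 (Klein)."""
    n = len(R)
    x = [0] * n
    for i in range(n - 1, -1, -1):
        r = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
        x[i] = math.floor(r + 0.5) if A is None else rand_round(r, A * R[i][i] ** 2, N, rng)
    return x

def distance(R, y, x):
    """Euclidean distance between y and R x for upper-triangular R."""
    n = len(R)
    return math.sqrt(sum((y[i] - sum(R[i][j] * x[j] for j in range(i, n))) ** 2
                         for i in range(n)))

def randomized_decode(R, y, K, rho, N=2, rng=random):
    """Run Babai once, then draw up to K samples and keep the closest point."""
    r_min = min(abs(R[i][i]) for i in range(len(R)))
    A = math.log(rho) / r_min ** 2
    best = sic(R, y)                      # Babai point as the initial candidate
    best_d = distance(R, y, best)
    for _ in range(K):
        if best_d < 0.5 * r_min:          # early stop: inside Babai's correct radius
            break
        cand = sic(R, y, A, N, rng)
        d = distance(R, y, cand)
        if d < best_d:
            best, best_d = cand, d
    return best, best_d
```

By construction the returned point is never farther from y than the Babai point, and the K samples are independent, so the loop body could be distributed across parallel workers.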

VI. SIMULATION RESULTS 

This section examines the performance of randomized lattice decoding. We assume perfect channel 
state information at the receiver. For comparison purposes, the performances of Babai's decoding, lattice 
reduction aided MMSE-SIC decoding and ML decoding are also shown. 

Fig. 4 shows the bit error rate for an uncoded system with $n_T = n_R = 10$, 64-QAM and LLL 
reduction ($\delta = 0.99$). Observe that even with 15 samples (G = 3 dB), the performance of real Klein's 
decoding enhanced by LLL reduction is considerably better (by 2.4 dB) than that of Babai's decoding. 
MMSE-based real Klein's decoding can achieve a further improvement of 1 dB. We found that K = 25 
(G = 4 dB) is sufficient for real MMSE-based Klein's decoding to obtain near-optimum performance for 
uncoded systems with $n_T = n_R \leq 10$; the SNR loss is less than 0.5 dB. The complex version of MMSE 
Klein's decoding suffers about 0.2 dB loss at a BER of 10^^ when compared to the real version. Note 
that the complex LLL algorithm has half the complexity of the real LLL algorithm. At high dimensions, 
the real LLL algorithm seems to be slightly better than complex LLL, although their performances are 
indistinguishable at low dimensions [34]. 

Fig. 4. Bit error rate vs. average SNR per bit for the uncoded 10 x 10 system using 64-QAM. 

Fig. 5 and Fig. 6 show the achieved performance of randomized lattice decoding for the 2x2 Golden 
code [35] using 16-QAM and the 4x4 perfect code using 64-QAM [8]. The decoding lattices are of dimension 
8 and 32 in the real space, respectively. In Fig. 5, the real MMSE-based Klein decoder with K = 10 
(G = 3 dB) enjoys a 2-dB gain. In Fig. 6, the complex MMSE-based Klein decoder with K = 20 (G = 3 
dB), K = 71 (G = 5 dB) and K = 174 (G = 6 dB) enjoys 3-dB, 4-dB and 5-dB gains, respectively. This again 
confirms that the proposed randomized lattice decoding bridges the gap to ML performance. Reference 
[19] proposed a decoding scheme for the Golden code that suffers a loss of 3 dB with respect to ML 
decoding, i.e., the performance is about the same as that of LR-MMSE-SIC. These experimental results 
are expected, as LLL reduction has been shown to increase the probability of finding the closest lattice 
point. Also, increasing the list size K available to the decoder improves its performance gain. Varying 
the number of samples K allows us to negotiate a trade-off between performance and computational 
complexity. 

Fig. 7 compares the average complexity of Babai's decoding, Klein's decoding and sphere decoding 
for uncoded MIMO systems using 64-QAM. The channel matrix remains constant throughout a block 
of length 10 and the pre-processing is only performed once at the beginning of each block. For the 
pre-processing, the effective LLL reduction has average complexity $O(n^3 \log n)$ [36], and the LLL 
algorithm can output the matrices Q and R of the QR decomposition. It can be seen that the average number of flops 
of Klein's decoding increases slowly with the dimension, while that of sphere decoding 
is exponential in the dimension. The computational complexity gap between Klein's decoding and Babai's 
decoding is nearly constant for fixed G. 

Fig. 5. Bit error rate vs. average SNR per bit for the 2x2 Golden code using 16-QAM. 

Fig. 6. Bit error rate vs. average SNR per bit for the 4x4 perfect code using 64-QAM. 



VII. CONCLUSIONS 

In this paper, we studied sampling-based randomized lattice decoding where the standard rounding 
in SIC is replaced by random rounding. We refined the analysis of Klein's algorithm and applied it to 
uncoded and coded MIMO systems. In particular, given the number of samples K, we derived the optimum 
parameter A to maximize the pseudo minimum distance $d_{\min,\mathrm{Random}}$, thereby optimizing the performance 
of Klein's randomized decoding algorithm. For fixed performance gain, we proved that the value of K 
retains the polynomial complexity of the decoding scheme. We also proposed an efficient implementation 
of random rounding which exhibits indistinguishable performance, supported by the statistical distance 
argument for the truncated discrete Gaussian distribution. The simulations verified that the 
performance of the proposed randomized decoding is superior to that of Babai's decoding. With the new 
approach, a significant fraction of the gap to ML decoding can be recovered for practical values of K. 
It is particularly easy to recover the first 3 dB loss of Babai's decoding, which needs only $O(\sqrt{n})$ 
samples. The computational structure of the proposed decoding scheme is straightforward and allows for an 
efficient parallel implementation. 

Fig. 7. Average number of floating-point operations for uncoded MIMO at average SNR per bit = 17 dB. Dimension n = 